(57) Abstract

A method and system (10) provides a face-to-face video conference utilizing a video mirror comprising a first station having a first 
predetermined sensory setting; a second station having a second predetermined sensory setting; and an imaging system for capturing an 
image or sub-image at the first station, displaying at least a portion of said image or sub-image at the second station. The imaging system 
includes a differentiator for generating a differential signal (101) in response to a comparison of a differential reference signal to an input 
signal generally corresponding to the image captured at the first station. The imaging system also includes a compositor for compositing the
differential signal with one or more other image signals and/or a predetermined composite signal to provide a composited video image (97) 
which appears visually contiguous and seamless. 
TELECONFERENCING METHOD AND SYSTEM

Background of the Invention
The present invention is related to a video
conferencing system and method and, more particularly, to
a teleconferencing system which is capable of producing a
"video mirror" at a station such that any participants at
one or more remote stations may be imaged and displayed
in the video mirror at the station so that they appear to
be present or face-to-face with any participants at the
station.

Visual telephone systems presently provide
communication between at least two locations for allowing
a video conference among participants situated at each
station. An objective in some video conferencing
arrangements is to provide a plurality of television
cameras at one location. The outputs of those cameras
are transmitted along with audio signals to a
corresponding plurality of television monitors at a
second location such that the participants at the first
location are perceived to be present or face-to-face with
participants at the second location. In achieving good
face-to-face presence, the number of conferees included
in the video picture from each camera is normally limited
to a few people, typically one to four. There are
usually a like number of monitors at the receiving
station, each strategically focused, aligned and
positioned so that their displays appear contiguous,
seamless and properly aligned. The apparatuses and
methods employed heretofore to achieve proper
positioning, focus and alignment have been complex and
costly.

Further, the images captured by the plurality
of cameras must be arranged and displayed so that they
generate a non-overlapping and/or contiguous field of
view, for example, as described in U.S. Patent No.
4,809,314 which issued to Judd et al. on December 26,
1989 and which is hereby incorporated by reference and
made a part hereof.

The prior art systems have also been deficient
because they have failed to provide means for generating
an image, such as an image of a plurality of
participants, at one station, differentiating the image
to provide a differentiated image and subsequently
compositing the differentiated image with a predetermined
composite image to provide a composited image which
complements or becomes visually complementary, contiguous
or integrated with the remote station when the image is
displayed at the remote station.

Another problem with prior art video
conferencing systems is eye contact among participants at
the stations. Typically, a camera is placed somewhere
above the display monitor at which a participant is
observing a display of the participant from the remote
station. Consequently, the camera captures the
participant at an angle above the participant's viewing
level or head. Thus, when an image of that participant
is displayed at the remote station, it appears as if the
participant is looking down (e.g., towards the ground).
Previous solutions to this problem have required complex
optical systems and methods using, for example, a
plurality of lenses and mirrors. The solutions have
usually been designed for use when the camera is
capturing an image of a single participant, and they fall
short when simultaneously capturing images of multiple
participants.

The prior art stations themselves were not
architecturally designed in a modular form so that they
could be easily assembled, decorated and combined with a
video image or sub-image from the remote station in a
manner which would enhance the virtual presence
environment.

Summary of the Invention

It is, therefore, a primary object of the
present invention to provide a face-to-face
teleconferencing system which enables a plurality of
participants at a plurality of stations to teleconference
such that the participants generally appear face-to-face
with one or more participants at remote stations in the
teleconferencing system.

Another object of this invention is to provide
a differentiator or differentiating means which
facilitates differentiating at least one image captured
at a station into a differentiated image which will
ultimately be transmitted to at least one remote station.

Another object of this invention is to provide
a method and system for compositing an image or sub-image
received from a remote station with a predetermined
composite image to provide a composited image, at least a
portion of which is displayed at the station.

Still another object of the invention is to
provide a system or method which provides a display
having a wide aspect ratio while utilizing cameras which
generate images having smaller aspect ratios.

Still another object of the invention is to
provide a method and system for defining a predetermined
sensory setting at one or more stations in order to
enhance the virtual presence environment at that station.

Still another object of the present invention
is to provide a method and apparatus for imaging subjects
at one station, processing such images, and displaying
such images at a remote station such that such images
complement and/or become visually integrated
with the remote station.

Another object of this invention is to provide
a method and apparatus which is capable of generating a
composite image having a plurality of different
resolutions.

Still another object of the present invention
is to provide a "video mirror" at a station.

Yet another object of the invention is to
provide an imaging system which provides a simplified
means for capturing substantially eye level images of
participants at stations while also providing means for
simultaneously displaying images at such stations.

Still another object of this invention is to
provide a system and method for compositing a plurality
of signals corresponding to a plurality of images from at
least one station to provide a contiguous or seamless
composite image.

Still another object is to provide a method and
system for providing a plurality of teleconferencing
stations that have complementary predetermined sensory
settings which facilitate creating a face-to-face
environment when images of such settings and participants
are displayed at remote stations.

Another object of the invention is to provide a
method and apparatus for generating a video mirror such
that an image having a predetermined sensory setting of
participants or subjects captured at one station may be
displayed at a remote station having a different
predetermined sensory setting, yet the remote
participants will appear face-to-face in the same
predetermined setting as the participants or subjects at
the one station.



WO 96/09722    PCT/US95/11802



In one aspect, this invention comprises an
image generator for use in a teleconferencing system
comprising a differentiator for comparing a differential
reference image to an input video image from a station
and for generating a differential image in response
thereto, and a compositor associated with a remote
station for receiving the differential image and for
combining that differential image with a predetermined
composite image to provide a composite image.

In another aspect, this invention comprises a
conferencing system comprising a first station comprising
a first sensory area defining a first aura, a
second station comprising a second sensory area defining
a second aura, and an image system for generating a first
station image of at least a portion of the first sensory
area and also for displaying a composite image
corresponding to the first station image at the second
station such that the first and second auras become
visually combined to provide an integrated face-to-face
environment at the second station.

In another aspect, this invention comprises an
image system for use in a conference environment
comprising a station having a first conference area and a
remote station having a remote video area, the image
system comprising a compositor for compositing a first
signal which generally corresponds to a video image of a
portion of the first conference area with a composite
reference signal to provide a composite image signal; and
a display for displaying the composited image signal at
the remote video area such that the first and second
stations appear complementarily integrated.

In still another aspect of the invention, this
invention comprises a teleconferencing system comprising
a first station having a first predetermined sensory
setting; a second station having a second
predetermined sensory setting; and an imaging system for
capturing an image at the first station and displaying at
least a portion of the image at the second station such
that it becomes generally visually integrated with the
second predetermined sensory setting.

In another aspect of this invention, this
invention comprises a station for use in a
teleconferencing environment comprising a first station
predetermined setting, first image sensing means
associated with the first station predetermined setting
for capturing images at the station for transmission to a
remote station, audio means for transmitting and/or
receiving audio signals from at least one remote station,
and display means for displaying an image including at
least one sub-image transmitted to the station from the
remote station so that the image becomes integrated with
the first station predetermined setting to facilitate
providing a face-to-face presence teleconference.

In still another aspect of the invention, this
invention comprises a method for providing a virtual
presence conference in a teleconferencing system having a
first station and a second station comprising the step of
displaying an image formed from at least one sub-image
from the first station at a predetermined location in the
second station such that the image becomes visually
integrated with the second station to define a single
predetermined aura at the second station.

In yet another aspect of the invention, this
invention comprises a method for teleconferencing
comprising the steps of teleconnecting a first station
having a first setting to a second station having a
second setting; and displaying a composite image
including an image of at least a portion of the first
station at the second station such that when the
composite image is displayed at the second station it
cooperates with the second setting to facilitate
providing a face-to-face environment at the second
station.

In still another aspect, this invention
comprises a method for teleconferencing comprising
generating at least one first station signal generally
corresponding to a first station image of the first
station, comparing the at least one first station signal
to a differential reference signal corresponding to a
first reference image and generating at least one
differential signal comprising a portion of the first
station image in response thereto, compositing the at
least one differential signal with a predetermined
composite signal corresponding to a predetermined image
to provide at least one composite image, and displaying
the at least one composite image corresponding to the
composite signal at a second station.

In yet another aspect, this invention comprises
a method for generating a seamless image at a station
from a plurality of sub-images at least one of which is
received from a remote station comprising the steps of
generating the plurality of sub-images, and combining the
plurality of sub-images with a predetermined composite
image to provide the seamless image.

These advantages and objects, and others, may 
be more readily understood in connection with the 
following specification, claims and drawings. 

Brief Description of the Accompanying Drawings 
Figs. 1A and 1B, taken together, show a
teleconferencing system according to one embodiment of
this invention;

Fig. 2 is a partly broken away top view of a
first station of the teleconferencing system shown in
Fig. 1A;

Figs. 3A and 3B, taken together, show another
embodiment of the present invention wherein the stations
have different predetermined sensory settings;

Figs. 4A and 4B, taken together, show still
another embodiment of the invention having stations which
have predetermined sensory settings which are designed,
decorated and defined to be complementary and/or
substantially identical;

Figs. 5A and 5B, taken together, provide a
visual illustration of the images corresponding to some
of the signals generated by the teleconferencing system;
and

Figs. 6A-6D, taken together, show a schematic
diagram of a method according to an embodiment of this
invention.

Detailed Description of Preferred Embodiment
Referring now to Figs. 1A and 1B, a
teleconferencing system 10 is shown having a first
station or suite 12 and a second station or suite 14.
The first station 12 comprises a first conference or
sensory area 16, and the second station 14 comprises a
second conference or sensory area 18-1, respectively.
The first and second stations 12 and 14 also comprise a
first video area 20 and a second video area 22-1,
respectively, associated with the first and second
conference areas 16 and 18-1. The first video area 20 is
generally integral with a wall 32h in the first station
12. Likewise, the second video area 22-1 is generally
integral with a wall 32h-1 in the second station 14. In
the embodiment being described, the first and second
stations are geographically remote from each other, but
they could be situated on the same premises if desired.

For ease of illustration, the construction and
modular assembly of the stations in teleconferencing
system 10 will be described in relation to the first
station 12. As shown in the sectional top view of Fig.
2, the first station 12 is shown assembled or constructed
into a generally elongated octagonal shape. The first
station 12 comprises a plurality of modular members
32a-32h which include walls 32a, 32c-e, 32g-h, doors in wall
members 32b and 32f and entry facade 32f-1. The first
station 12 also comprises a ceiling 34 (Fig. 1A) which is
mounted on the members 32a-32h with suitable fasteners,
such as nuts, bolts, adhesives, brackets, or any other
suitable fastening means. Notice that the ceiling 34 has
a dropped or sunken portion 34a which supports
appropriate lighting fixtures 56.

In the embodiment being described, each of the
members 32a-32h and the ceiling 34 is molded or formed to
provide or define an environment having a unique
architectural setting and/or sensory setting. For
example, as illustrated in Fig. 1A, the wall member 32a
may be formed to provide a plurality of stones 36, a
plurality of columns 38, and an arch 40 to facilitate
defining a first predetermined setting 12a having a
Roman/Italian motif, theme or aura. One or more of the
members 32a-32h may be provided with inlays, wall
decorations (like picture 58 in Figs. 1A and 2), or even
a permanent frosted glass window and frame arrangement 42
mounted therein. Furthermore, members 32b and 32f (Fig.
2) may be provided with sliding doors 44 which facilitate
entering and exiting the first station 12 and which are
designed to complement or further enhance the
Roman/Italian motif.
In the embodiment being described, notice that
member 32h (Figs. 1A and 2) is formed to provide a stone
and pillar appearance and texture complementary to the
stone and pillar appearance and texture of the wall
members, such as member 32a. Also, the member 32a may be
shaped to frame or mask a rear projection screen 46, as
shown. The function and operation of the rear projection
screen 46 will be described later herein. In the
embodiment being described, the rear projection screen 46
comprises a high resolution lenticular rear projection
screen which is either integral with or mounted directly
to member 32h to provide a first video area 20 having a
usable projection area of about 52 inches by 92 inches
with an associated aspect ratio of 16:9.
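
As a rough check on the stated dimensions, the following sketch compares the usable projection area of about 92 inches by 52 inches with a nominal 16:9 aspect ratio. It is illustrative only; the variable names are hypothetical and form no part of the embodiment being described.

```python
import math

# Usable projection area stated in the description (inches).
width_in, height_in = 92.0, 52.0

screen_ratio = width_in / height_in   # width divided by height
nominal_ratio = 16 / 9                # exact 16:9 shape

# Relative deviation of the screen from an exact 16:9 shape.
relative_error = abs(screen_ratio - nominal_ratio) / nominal_ratio

# Diagonal of the usable area, by the Pythagorean theorem.
diagonal_in = math.hypot(width_in, height_in)

print(f"ratio {screen_ratio:.3f} vs 16:9 {nominal_ratio:.3f}, "
      f"deviation {relative_error:.2%}, diagonal {diagonal_in:.1f} in")
```

The deviation is under one percent, which is consistent with the dimensions being given as approximate.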

The members 32a-32h and ceiling 34 are each
created as separate modular units using a plurality of
molds (not shown). In the embodiment being described, a
suitable material for molding the members 32a-32h and
ceiling 34 to provide a granite-like appearance may be
gypsum, but they could be formed from other suitable
material such as stone or clay-based materials, ceramic,
paper, cardboard, foam, wood, Styrofoam and the like. As
illustrated in Figs. 1A and 2, the member 32d may be provided
with a shelf or mantle 33. The various members 32a-32h
are assembled together as shown in Fig. 2 and secured
together with suitable support braces 48 which may be
secured to the walls 32a-32h with any suitable fastener
such as screws, bolts, an adhesive or the like. After
the first station 12 is assembled and the ceiling 34 is
secured thereto, it has a length of about 14 feet, 6
inches (indicated by double arrow L in Fig. 2) and a
width of about 12 feet, 0 inches (indicated by double
arrow W in Fig. 2). The first station 12 has an
approximate height from floor to ceiling 34 of about 8
feet, 6 inches. Further, the members 32a, 32c, 32e and
32g have a width (indicated by double arrow Y in Fig. 2)
of about 5 feet, 0 inches. Finally, the back wall member
32d and front wall member 32h each have a width of about
7 feet, 8 inches (indicated by double arrow X in Fig. 2).

After the members 32a-32h and ceiling 34 are
assembled, the first station 12 may be further decorated,
designed or ornamented with a plurality of subjects,
decorations or ornaments which facilitate providing the
first predetermined sensory setting 12a which defines a
first aura, motif or theme. Likewise, the second station
14 may be further provided or ornamented with a plurality
of subjects, decorations or ornaments which facilitate
providing a second predetermined sensory setting 14a
which defines a second aura, motif or theme. For
example, as illustrated in Fig. 1A, the predetermined
sensory setting 12a of the first station 12 may be
further decorated with a table 50, table decorations,
pillar and wall decorations, carpet (not shown), plants
54 and other wall decorations (not shown) to further
enhance the Roman/Italian motif, theme or aura. The
first and second predetermined sensory settings 12a and
14a may also comprise appropriate lighting fixtures 56
and appropriate furnishings, such as chairs 60 and tables
61, which complement the predetermined setting to further
facilitate defining the Roman/Italian theme or motif for
the stations 12 and 14.

It should be appreciated that once the first
and second stations 12 and 14 are assembled and
ornamented or decorated to provide their respective first
and second predetermined sensory settings 12a and 14a,
they define an aura, theme or motif which facilitates
providing or creating a very sensual and impressionable
environment. Providing such a station, such as station
12, with a strong sensory environment facilitates
enhancing the virtual presence illusion created by
teleconferencing system 10 of the present invention.

It should also be appreciated, however, that
although the first station 12 and second station 14 are
shown in the embodiment in Figs. 1A and 1B as having
complementary or similar first and second predetermined
sensory settings 12a and 14a, they could be provided with
first and second predetermined sensory settings 12a and
14a having different themes, motifs or auras. Thus,
while the embodiment described in relation to Figs. 1A
and 1B illustrates a first and second set of stations 12
and 14 having a Roman/Italian motif, another set of
stations, such as station 12' and station 14' in the
embodiment illustrated in Figs. 3A and 3B, may have at
least one station having a different predetermined
setting. For example, the second station 14' in Fig. 3B
provides a setting 14a' which defines a Chinese aura,
theme or motif.

It should also be appreciated that the members
32a-32h, ceiling 34 and associated predetermined sensory
setting are provided to be transportable and capable of
being assembled at any suitable location, such as an
existing rectangular room, suite or conference area
having dimensions of at least 20 feet x 20 feet x 9 feet.
While it may be desirable to provide the first and second
stations 12 and 14 in the teleconferencing system 10 with
substantially the same dimensions, it should be
appreciated that they could be provided with differing
dimensions, depending on, for example, the number of
participants at each station. It should also be
appreciated that the second station 14 and other stations
described herein would preferably be manufactured and
assembled in the same or similar manner as the first
station 12. Also, the stations in the teleconference
system 10 may be decorated with wall, ceiling and floor
coverings to provide, for example, the first
predetermined sensory setting 12a without using the
pre-formed or molded modular members 32a-32h described above,
although the use of such members may be preferable in
this embodiment.

The teleconferencing system 10 also comprises
conferencing means or a conferencing system means for
teleconnecting the first and second stations 12 and 14
together to facilitate capturing an image or images at
one of said stations and displaying at least a portion of
the image or a sub-image at another of the stations such
that it becomes generally visually integrated with the
predetermined sensory setting at that station, thereby
facilitating creating a "video mirror" and a
"face-to-face" environment for the participant situated at that
station. As shown in Fig. 1A, the conferencing system
associated with the first station 12 comprises image
sensor means, imager or image sensors for sensing images
at the first station 12. For the embodiment shown in
Figs. 1A and 2, the image sensor means comprises a
plurality of cameras which are operably associated with
the rear projection screen 46 of first station 12. In
this regard, the plurality of cameras comprise a first
camera head 62 and second camera head 64 which are
operatively coupled to a first camera control unit 66 and
second camera control unit 68, respectively. Notice that
the first and second camera control units 66 and 68 are
remotely situated from the first and second camera heads
62 and 64. This facilitates permitting the first and
second cameras 62 and 64 to be placed directly in the
projection path of the rear projection screen 46, without
substantially interfering with the video image being
projected.

In the embodiment being described, the first
camera head 62 and second camera head 64 are situated
approximately 16 inches above the surface of table 50
which generally corresponds to the eye level of the
seated participants situated at table 50. As illustrated
in Fig. 2, the first and second cameras 62 and 64 are
situated behind the rear projection screen 46 in
operative relationship with a pair of 1-1/4 inch diameter
openings 66 and 68, respectively. The first and second
cameras 62 and 64 are mounted on a suitable narrow or
non-interfering bracket (not shown) such that they can be
positioned behind the rear projection screen 46 in
operative relationship with openings 66 and 68,
respectively. In the embodiment being described, the
first and second cameras 62 and 64 are 1-1/4 inch by
1-1/4 inch 3-CCD camera heads which generate images having
an aspect ratio of about 3:4 and a picture resolution of
about 494 x 700 pixels. Suitable 3-CCD camera heads
62 and 64 and associated camera control units 66 and 68
may be Model No. GP-US502 manufactured by Panasonic
Broadcast and Television Systems Company of Japan. It
should be appreciated that while the teleconferencing
system 10 shown and described in relation to Figs. 1A and
1B shows image sensor means comprising a plurality of
camera heads 62 and 64 and camera control units 66 and 68
situated at a station, a single camera may be used (as
shown and described relative to the embodiment shown in
Figs. 4A and 4B) or even multiple cameras could be used
depending on such things as the size of the station, the
number of participants situated at the station, and/or
the aspect ratio of each camera head selected. It should
also be appreciated that the camera heads 62 and 64 and
associated camera control units 66 and 68 are configured
and positioned at the first station 12 to facilitate
providing maximum vertical eye contact among participants
in the teleconference, while minimally interrupting the
substantially life-size video projection on the rear
projection screen 46.

The conferencing means also comprises a first
differentiator or differential key generator 70 (Fig. 1A)
and a second differentiator or differential key generator
72, respectively. The camera control unit 66 generates
an RGB analog signal 1-62 which is received by the first
differentiator 70, and the camera control unit 68
generates an RGB signal 1-64 which is received by the
second differentiator 72. The first and second
differentiators 70 and 72 provide means for processing
the image signals generated by the camera control units
66 and 68 to remove or differentiate any undesired
portion of the images corresponding to the signals 1-62
and 1-64. For example, as described in detail later
herein, it is desired in this embodiment to separate the
image of the participants situated at the first station
12 from at least a portion of the first predetermined
sensory setting 12a, such as the background behind the
participants, in order to provide a differential signal
VS-1 that has that portion of the first predetermined
sensory setting 12a removed. This, in turn, facilitates
transmitting the video image of the participants at the
first station 12 to the remote second station 14 and also
facilitates compositing the image with other images, as
described below.

Suitable differentiators 70 and 72 may comprise
the differential key generator shown and described in
U.S. Patent No. 4,800,432, issued on January 24, 1989 to
Barnett et al. and assigned to The Grass Valley Group,
Inc., which is incorporated herein by reference and made
a part hereof.

The differential key generators 70 and 72
convert the 1-62 and 1-64 signals from RGB analog signals
to digital image signals having corresponding images 104
and 106 (Fig. 5A), respectively. The differential key
generators 70 and 72 compare the digital image signals to
associated differential reference signals DRS-62 and
DRS-64, respectively, which generally correspond to
images 108 and 110 in Fig. 5A. As described in detail
later herein, these images 108 and 110 comprise at least
a portion of the first predetermined sensory setting 12a
such as the background. The differential reference
signals DRS-62 and DRS-64 are stored in appropriate
storage 74 and 76 (Fig. 1A) associated with the
differential key generators 70, 72, respectively. In the
embodiment being described, the differential reference
signals DRS-62 and DRS-64 comprise a reference frame of a
video image grabbed by one or both cameras 62 or 64
situated at the first station 12 from a video sequence of
the first predetermined sensory setting 12a of the first
station 12 background where no participants, chairs, or
other foreground elements are in place.
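
The comparison against a stored reference frame may be pictured with the following sketch. It is a minimal illustration only and is not the circuit of the referenced differential key generator: the function name differential_key, the single-channel 8-bit pixel values and the fixed threshold are all assumptions made for the example. Each pixel of an input frame is compared with the corresponding pixel of a reference frame of the empty setting, and pixels that match within the threshold are marked as zero.

```python
def differential_key(frame, reference, threshold=30):
    """Return a copy of frame in which pixels close to the reference are 0.

    frame and reference are equal-sized lists of rows of integer pixel
    values. The threshold is a hypothetical tolerance for camera noise.
    """
    keyed = []
    for frame_row, ref_row in zip(frame, reference):
        keyed.append([
            pixel if abs(pixel - ref) > threshold else 0
            for pixel, ref in zip(frame_row, ref_row)
        ])
    return keyed


# Reference frame: the empty setting (uniform background, for simplicity).
reference = [[100, 100, 100, 100],
             [100, 100, 100, 100]]

# Input frame: the same setting with a brighter "participant" in view.
frame = [[100, 180, 185, 102],
         [99, 175, 100, 100]]

keyed = differential_key(frame, reference)
print(keyed)
```

A practical keyer would operate on all three colour channels and would need a background tag that cannot be confused with a genuinely black foreground pixel; the zero value is used here only for simplicity.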
In response to the comparison, the first and
second differentiators 70 and 72 generate differentiated
video signals VS-1 and VS-2 (Fig. 1A), respectively. As
illustrated in Fig. 5, the VS-1 and VS-2 signals
generally correspond to the individuals situated at the
first station 12 when viewed in the direction of arrow A
in Fig. 2. As illustrated in the images 112 and 114
(Fig. 5) associated with the VS-1 and VS-2 signals,
respectively, notice that the background area shown in
images 104 and 106 has been removed and is tagged as a
"zero" image area.
Advantageously, tagging at least a portion of
the image represented by the VS-1 signal as "zero"
background facilitates compressing the VS-1 and VS-2
signals and providing corresponding compressed CDS-1 and
CDS-2 signals, thereby reducing the amount of
transmission bandwidth needed. This tagging also
facilitates compositing or overlaying another
predetermined image to provide a seamless composited
image as described in detail below.
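
One way to picture the overlay is the sketch below, which assumes, for illustration only, that the differential image and the predetermined image are equal-sized grids of single-channel pixel values and that a value of zero marks the removed background. The function name composite is hypothetical. Wherever the differential image carries the zero tag, the pixel of the predetermined image shows through; elsewhere the participant pixel is kept.

```python
def composite(differential, predetermined):
    """Overlay a zero-tagged differential image on a predetermined image.

    Both arguments are equal-sized lists of rows of integer pixel values.
    A value of 0 in the differential image is treated as "no foreground".
    """
    return [
        [fg if fg != 0 else bg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(differential, predetermined)
    ]


# Differential image: participant pixels with the background tagged as 0.
differential = [[0, 180, 185, 0],
                [0, 175, 0, 0]]

# Predetermined image to be shown behind the participants.
predetermined = [[10, 20, 30, 40],
                 [50, 60, 70, 80]]

composited = composite(differential, predetermined)
print(composited)
```

Because the tagged region is a long run of identical values, it also lends itself to compact encoding, which is in keeping with the bandwidth reduction noted above.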

The video signals VS-1 and VS-2 are received by
a first compression/decompression means or CODEC 78 and a
second compression/decompression means or CODEC 80,
respectively. The CODECs 78 and 80 also receive audio
signals AS-A1 and AS-A2 from suitable microphones 82 and
83, respectively, which may be positioned or concealed at
an appropriate location in the first station 12, such as
underneath or on top of table 50, as illustrated in Fig.
1A. The function of the first and second CODECs 78 and 80
is to compress video and audio signals for transmitting
to remote stations, such as the second station 14, and
also to decompress compressed video and audio signals
received from remote stations. Consequently, the CODECs
78 and 80 are configured with suitable compression and
decompression algorithms which are known to those of
ordinary skill in the art. The CODEC Model No. Rembrandt
II VP available from Compression Labs, Inc. of San Jose,
California is suitable for use in the embodiment
described herein, but it should be noted that other
suitable compression/decompression means may be employed.

The CODEC 78 receives the video signal VS-1 and
audio signal AS-A1, and CODEC 80 receives the video
signal VS-2 and audio signal AS-A2. The CODECs 78 and
80 generate digital signals CDS-1 and CDS-2,
respectively, in response thereto which are in turn
transmitted to remote station 14 via a transmission
network 84.

The transmission network 84 may be configured
as a private network or public circuit switch service, and
it may utilize telecommunication and/or satellite
technology. In the embodiment being described, the
transmission network 84 preferably includes a plurality
of T-1 lines (not shown) which are capable of
accommodating bit streams having a suitable bandwidth,
such as 1.544 megabits per second.
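
By way of illustration only, the following Python sketch estimates how
much compression a single T-1 line would demand for one uncompressed
video stream. The 352 x 288 pixel frame size is the figure given in this
description for images received from a remote station, and 1,544,000 bits
per second is the nominal T-1 line rate. The 24 bits per pixel and 30
frames per second are assumptions chosen for the example and are not
stated in the description.

```python
# Nominal T-1 line rate in bits per second.
T1_BITS_PER_SECOND = 1_544_000


def raw_video_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Bit rate of an uncompressed video stream, in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def required_compression_ratio(raw_bits_per_second, channel_bits_per_second):
    """How many times the raw stream must shrink to fit the channel."""
    return raw_bits_per_second / channel_bits_per_second


# Assumed for illustration: 24 bits per pixel, 30 frames per second.
raw = raw_video_bit_rate(352, 288, 24, 30)
ratio = required_compression_ratio(raw, T1_BITS_PER_SECOND)

print(f"raw rate: {raw} bit/s, required ratio: about {ratio:.1f}:1")
```

Under these assumed values the raw stream is roughly 73 megabits per
second, so compression on the order of 47:1 would be needed before audio
and framing overhead are considered.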

The teleconferencing system 10 and conference
means associated with the first station 12 also comprise
enhancing means for enhancing the resolution of an image
or sub-image received from a remote station, such as the
second station 14. In the embodiment being described,
the enhancing means comprises a first line doubler 86 and a
second line doubler 88 which are operatively coupled to
the first CODEC 78 and second CODEC 80, respectively. In
this embodiment, the first and second line doublers 86
and 88 enhance the resolution and picture quality of at
least a portion of the image corresponding to video
signals VS-3 and VS-4 received from the CODECs 78 and 80,
respectively, by about 50-150%. The VS-3 and VS-4
signals correspond to images or sub-images received from
remote station(s), such as station 14, as described in
detail below. One suitable line doubler is the Model
No. LD 100 available from Faroudja Laboratories, Inc. of
Sunnyvale, California, but other suitable enhancing means
may be used to provide greater or less enhancement
of the images to be displayed. For example, lenses,
mirrors, optical pixel interpolation or other electrical
means may be employed as desired. It should also be
noted that the present invention may be performed without
the use of any enhancing means without departing from the
scope of the invention.
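
The general idea of line doubling can be sketched as follows. This is a
minimal illustration that inserts one interpolated row after every
original row of a grayscale frame; it is not a description of the
Faroudja LD 100 or of any commercial line doubler, whose internal
processing is not given here. The function name double_lines and the
list-of-rows frame representation are assumptions made for the example.

```python
def double_lines(frame):
    """Return a frame with twice as many rows.

    Each original row is followed by a row that averages it with the
    next original row. The last row has no successor, so it is repeated.
    """
    doubled = []
    for index, row in enumerate(frame):
        doubled.append(list(row))
        next_row = frame[index + 1] if index + 1 < len(frame) else row
        doubled.append([(a + b) // 2 for a, b in zip(row, next_row)])
    return doubled


# A tiny 3-row, 2-column grayscale frame used only for demonstration.
frame = [[0, 0], [10, 20], [20, 40]]
doubled = double_lines(frame)
for row in doubled:
    print(row)
```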

The first and second line doublers 86 and 88
generate enhanced video signals which are input into
compositing means, compositor or video compositing
multiplexer 92 for compositing the enhanced video signals
associated with the images or sub-images received from
the remote station(s) with one or more predetermined
composite signals, such as predetermined composite signal
A, corresponding to a predetermined composite image or
sub-image which are stored in a suitable storage device
94 associated with the compositor 92. In the embodiment
being described, the predetermined composite signal A
corresponds to an image of at least a portion of first
predetermined sensory setting 12a, such as the background
of the first station 12. The video compositing
multiplexer 92 composites the signals received from the
first and second line doublers 86 and 88 with the
predetermined composite signal A and generates an RGB
analog composite signal in response thereto. It has been
found that Model No. E-Space-1 available from Miranda
Technologies, Inc. of Montreal, Quebec, Canada, is one
suitable video compositing multiplexer 92.

The teleconferencing system 10 comprises a
projector 96 coupled to the video compositing multiplexer
92 which receives the RGB composite signal and projects a
corresponding image 90 (Fig. 1A) on the rear projection
screen 46. The Model No. 3300 available from AMPRO
Corporation of Titusville, Florida has been found to be a
suitable projector 96. Although the embodiment has been
described using projector 96 and rear projection screen 46,
other suitable means may be employed for projecting or
displaying the composited image. For example, a liquid
crystal display (LCD) or other electronic screen may be
suitable to display images at a station. This may
eliminate the need for the projector 96.

The projector 96 could be used with an optical
system or a plurality of mirrors (not shown), or prisms
(not shown) such that the projector can be positioned,
for example, to the side or below the rear projection
screen 46 or in a manner that permits the projector 96 to
project the image towards a mirror (not shown), which
causes the image to be projected on the rear projection
screen 46.

As described in detail below, the composite
signal and its corresponding image 90 generally comprise
a video image of at least a portion of the first
predetermined sensory setting 12a combined or composited
with a differentiated image, such as an image of the
participants from the second station 14 which correspond
to the VS-3 and VS-4 (Fig. 1B) signals. Consequently,
the resultant image 90 projected on screen 46 at the
first station 12 complements or blends with the
architectural motif, aura, theme or design defined by the
first predetermined sensory setting 12a at the first
station 12, such that the projected image 90 appears
visually integrated with the first predetermined sensory
setting 12a of the first station 12. This, in turn,
causes any image of the participants situated at the
second station 14 and included in the image 90 to appear
to be face-to-face with participants at the first station
12 during the teleconference. The operation of the
compositor 92 is described in more detail later herein.

It should be appreciated that the sub-images or
images received from the remote station(s) typically have
a resolution on the order of about 352 x 288 pixels and
the predetermined composite signal A comprises a
resolution on the order of about 1280 x 1024 pixels.
Thus, the resultant composite image 90 may comprise, for
example, an image of the participants situated at the
second station 14 having a first resolution and a
background image of the first station 12 having a second
resolution, which is higher than the first resolution.
This enables compositor 92 to provide a composite image
90 which, when displayed on screen 46, gives the illusion
or effect of a "video mirror" to the participants
situated at the first station 12.

The teleconferencing system 10 also includes
audio means comprising a plurality of speakers 100 and
102 (Figs. 1A and 2) which, in turn, receive audio
signals AS-B1 and AS-B2 from CODECS 78 and 80,
respectively. It should be appreciated that the audio
signals AS-B1 and AS-B2 generally correspond to the
sound (e.g., voices, music and the like) associated with
the remote station(s), such as second station 14.

It should also be appreciated that the rear
projection screen 46 and projector 96 are configured and
selected to enable the teleconferencing system 10 to
project the composited image 90 (Fig. 1A) at a
predetermined scale, such as substantially full scale.
In this regard, the compositor 92 comprises a scaler 95
which is integral therewith for scaling the composited
signal associated with the composited image 90 to a
desired or predetermined scale, such as substantially
full scale.

Referring now to Fig. 1B, the second station 14
comprises similar components as the first station and
such like components are labelled with the same reference
numeral as their corresponding component in the first
station 12, except that the components associated with
the second station 14 have a "-1" designator added
thereto. Such components operate and function in
substantially the same manner as described above with
regard to the first station 12 with the following being
some differences. The differential reference signals
DRS-3 and DRS-4 (Fig. 5) associated with the second
station 14 generally correspond to an image or sub-image
of at least a portion of the second predetermined sensory
setting 14a, such as the background 98-1, of the second
station 14. Such sub-image or image may include at least
a portion of the background 98-1 without any
participants, chairs or other foreground subjects
situated in the second station 14. Also, like the
predetermined composite signal A stored in the storage 94
associated with the first station 12, a predetermined
composite signal B may be stored in the storage 94-1
associated with the compositor 92-1 of the second station 14.
The predetermined composite signal B may correspond to an
image or sub-image of at least a portion of the second
predetermined sensory setting 14a of the second station
14. Such sub-image or image may include, for example, an
image of the walls 32a-1 to 32h-1 and conference area 18
or background of the second station 14. Notice that in
the embodiment shown in Figs. 1A and 1B, the second
station 14 has a second predetermined sensory setting 14a
which mirrors or is complementary to the first
predetermined sensory setting 12a. As described above,
however, the first and second predetermined sensory
settings 12a and 14a may be different.

A method of operating the teleconferencing
system 10 will now be described in relation to Figs. 6A-6D.
The modular components, such as members 32a to 32h
and ceiling 34 for first station 12, decorations and the
like, are configured, assembled and decorated (block 99
in Fig. 6A) at a desired location to provide a conference
station comprising a predetermined sensory setting 
defining a predetermined theme, motif or aura. As
mentioned earlier herein, the theme, motif or aura may be
complementary (as shown in Figs. 1A and 1B) or they can
be completely different, as shown in Figs. 3A and 3B
(described below). For ease of illustration, it will be
assumed that the stations are assembled and decorated as
shown and described relative to the embodiment in Figs.
1A and 1B.

Once the modular stations 12 and 14 are
assembled and decorated, it may be desired (decision
point 101 in Fig. 6A) to use a differentiator (e.g.,
differentiator 72 in Fig. 1A). As discussed herein
relative to the embodiments shown in Figs. 4A and 4B, it
may not always be desired to generate a differential
reference image, thereby making it unnecessary to
generate the differential reference signal. If
differentiation is desired, then the camera heads 62 or
64 generate at least one video image (block 103) of at
least a portion of the first predetermined sensory
setting 12a at the first station 12. The differentiators
70 and 72 grab or capture at least one differential
reference image or sub-image from those images and
generate (block 107) the differential reference signals
DRS-62 and DRS-64, respectively. These signals are
stored in suitable storage 74 and 76 for use by the
differentiators 70 and 72, respectively. Likewise,
cameras 62-1 and 64-1 at the second station 14 generate
video images of at least a portion of the second
predetermined setting 14a at the second station 14. The
differentiators 70-1 and 72-1 grab or capture at least
one differential reference image or sub-image from those
images and generate differential reference signals (not
shown) corresponding thereto. These signals are then
stored (block 109) in suitable storage 74-1 and 76-1 for
use by differential key generators 70-1 and 72-1,
respectively.

As mentioned above, it is preferred that the
differential reference signals DRS-62 and DRS-64 comprise
an image of at least a portion of the first predetermined
sensory setting 12a, such as an image of the first
station 12 without any participants, chairs or other
subjects which are not stationary during the
teleconference. Likewise, it is preferred that the
differential reference signals associated with the
differentiators 70-1 and 72-1 comprise at least a portion
of the second predetermined sensory setting 14a at the
second station 14, such as an image of the background 98-1
without the participants, chairs and other subjects
which are not stationary during the teleconference.

If differentiation of signals is not selected
or at the end of the differentiation process, it may be
desired to generate a composite image (decision point 97)
for one or more of the stations. As discussed below,
however, this may not always be required to achieve
certain advantages of the invention. Such predetermined
composite image would preferably include a substantial
portion of the first predetermined sensory setting 12a,
including the background and/or conference area 16 of the
first station 12. If compositing is desired, then the
predetermined composite signal A is generated (block 111
in Fig. 6B). The corresponding predetermined composite
signal A may then be stored in suitable storage 94. In
the same manner, the predetermined composite image at the
second station 14 and corresponding predetermined
composite signal B may be generated and stored
in suitable storage 94-1. In the embodiment being
described, the predetermined composite image associated
with the second station 14 includes an image of at least
a portion of the second predetermined sensory setting 14a,
including the background 98-1.

In the embodiment being described, the
predetermined composite signals A and B are generated by
a suitable still camera (not shown) to provide a still
image (not shown) of the station 12 or 14 being
photographed. The still image would subsequently be
scanned and digitized for storage by a suitable scanner
(not shown). The still camera and scanner would
preferably be capable of generating images having a
resolution on the order of about 1280 x 1024 pixels.
Thus, if compositing is performed, the resultant
composite image (such as image 90 in Fig. 1A) may
comprise an image having a high resolution background,
for example, combined with a comparatively lower
resolution image of the remote station participants.
This, in turn, facilitates enhancing the "video mirror"
effect, which mimics or replicates the common
architectural technique of mirroring a wall of a given
room so that the overall room appears to extend
beyond its actual wall line.

Once the stations 12 and 14 are configured and
the differential reference signals and predetermined
composite signals A and B are generated and stored, the
first and second suites 12 and 14 may then be
teleconnected (block 113) or connected by satellite or
other suitable means via the transmission network 84.

Next, one or more participants may be situated
at the first and second stations 12 and 14. As
illustrated in Fig. 2, notice that the participants
seated at the first station 12 are situated a
predetermined distance B from a participant's side 46a of
the rear projection screen 46. The predetermined
distance B generally corresponds to a preferred or
optimum focal distance at which optimum imaging by
cameras 62 and 64 may be performed. In the embodiment
being described, it has been found that the predetermined
distance should be about 5 feet, 6 inches. The
participants are situated at the second station 14 in a
similar manner and the face-to-face teleconference may
then begin.

For ease of illustration, the imaging and
display of first station 12 participants at the second
station 14 will be described. The first and second
cameras 62 and 64 capture (block 117 in Fig. 6B) live
images of the participants situated at the first station
12 and generate corresponding RGB analog signals 1-62 and
1-64 which are received by the differential key
generators 70 and 72, respectively. If differentiation
was selected (decision point 147 in Fig. 6C), processing
continues at block 119; otherwise it proceeds at block
123. The differential key generators 70 and 72 generate
(block 121 in Fig. 6C) the digital differential signals
VS-1 and VS-2, respectively, after comparing (block 119
in Fig. 6C) the 1-62 and 1-64 signals received from
cameras 62 and 64 to their respective differential
reference signals DRS-62 and DRS-64 which are received
from storages 74 and 76.
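
The comparison performed by a differential key generator can be pictured
with the following minimal Python sketch. It assumes grayscale images
held as lists of rows and a simple per-pixel threshold test; the function
name differential_key, the threshold value of 16 and the sample pixel
values are all hypothetical, and the actual comparison used by the
differential key generators 70 and 72 is not specified in this
description.

```python
def differential_key(live, reference, threshold=16):
    """Tag pixels that match the stored reference image as 0.

    Pixels whose value differs from the reference by more than the
    threshold are treated as foreground and kept unchanged. A real
    system would need a separate key so that genuinely black
    foreground pixels are not confused with the "zero" tag.
    """
    keyed = []
    for live_row, ref_row in zip(live, reference):
        keyed.append([
            pixel if abs(pixel - ref) > threshold else 0
            for pixel, ref in zip(live_row, ref_row)
        ])
    return keyed


# Reference image: an empty background captured before the conference.
reference = [[100, 100, 100, 100],
             [100, 100, 100, 100]]
# Live image: the same view with a bright subject in the middle.
live = [[100, 102, 200, 100],
        [98, 210, 205, 100]]

keyed = differential_key(live, reference)
print(keyed)
```

Small variations from the reference, such as sensor noise, fall under
the threshold and are tagged as background, while the subject pixels pass
through unchanged.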

The differential signals VS-1 and VS-2 are then
received by CODECS 78 and 80 which also receive the audio
signals AS-A1 and AS-A2 which correspond to the audio,
including sounds, music and voices, associated with the
first station 12. The CODECS 78 and 80 digitize the
audio signals AS-A1 and AS-A2, combine the audio signals
with their respective video signal VS-1 or VS-2, and
generate (block 123) the compressed CDS-1 and CDS-2
signals in response thereto. The CDS-1 and CDS-2
signals are then transmitted (block 125) to the second
station 14 via the transmission network 84 (Fig. 1B).

The CDS-1 and CDS-2 signals are received and
decompressed (block 127 in Fig. 6C) by CODECS 78-1 and
80-1, respectively, associated with the second station 14
to provide decompressed VS-1 and VS-2 signals. The
CODECS 78-1 and 80-1 also decompress the audio signals
AS-A1 and AS-A2 received from the first station 12 which
are transmitted to speakers 100-1 and 102-1,
respectively, at the second station 14.

Substantially simultaneously with the
broadcasting of the audio signals at the second station
14, CODECS 78-1 and 80-1 decompress the CDS-1 and CDS-2
signals to provide VS-1 and VS-2 signals. The
decompressed video signals VS-1 and VS-2 are then
received by line doublers 86-1 and 88-1. If it is
desired to enhance the signals (decision point 129), then
the line doublers 86-1 and 88-1 process or manipulate the
signals (block 131) in order to enhance the resolution of
the image corresponding to those signals. After the
signals VS-1 and VS-2 are processed, it may be desired to
composite (decision point 133 in Fig. 6D) those signals
with one or more other signals. In this illustration,
for example, the video compositor 92-1 composites images
(block 135) corresponding to those signals with at least
one predetermined composite image, such as image 122
(Fig. 5B) corresponding to the predetermined composite
signal B provided from storage 94-1 (Fig. 1B) to provide
a composite signal. As mentioned above, the composite
signal generally corresponds to the composited image 91-1
to be displayed on the rear projection screen 46-1 at the
second station 14.

The compositor 92-1 may (decision point 137,
block 139 in Fig. 6D) scale the composited image to a
desired scale, such as full scale, using scaler 95-1.
Thereafter, the compositor 92-1 transmits a corresponding
RGB analog signal to projector 96-1 which displays (block
141) the scaled, composited image on the rear projection
screen 46-1 (Fig. 1B).

The teleconference may then be continued or 
terminated as desired (decision point 143, block 145). 

Because the composited image is substantially
full scale when projected and includes a high resolution
image of at least a portion of the second predetermined
sensory setting 14a, the image appears to blend or become
visually integrated with the second predetermined sensory
setting 14a. This, in turn, gives the participants
situated at the second station 14 the perception that the
first station participants are present or face-to-face
with them in the second station 14.

In the same or similar manner, images and
signals relative to the second station 14 are
captured, processed and displayed at the first station
12, so that images of the participants at the second
station 14 are displayed at the first station 12 such
that they appear to have a face-to-face presence at the
first station 12. Thus, images of the second station 14
participants may be differentiated and composited such
that, when they are displayed at the first station 12,
the image completes or provides "the other half" of the
first station 12 and becomes generally visually
integrated therewith. Although not required, it may be
desirable to enhance the face-to-face presence by
providing, for example, first and second predetermined
sensory settings 12a and 14a which define a dining
environment wherein food or meals may be served. For
example, the face-to-face presence may be further
enhanced if the participants at both stations 12 and 14
order food and drinks from identical menus. Also,
trained maître d's and/or waiters may be used to actively
promote the perception of a face-to-face dinner using a
scripted dialog and interaction with remote participants,
maître d's and/or waiters.

Once the teleconferencing is terminated, the
stations 12 and 14 may be used by the same or different
participants without the need to reconstruct or
re-assemble the stations.

Figs. 5A and 5B provide a visual illustration
of the images corresponding to some of the signals
described above utilizing the method and embodiment
described above. In this regard, images 104 and 106
generally correspond to the actual images captured by the
first and second cameras 62 and 64, respectively. As
described above, associated image signals 1-62 and 1-64
are transmitted to the differential key generators 70 and
72, respectively. The differential key generators 70 and
72 compare the images 104 and 106 to the images 108 and
110 associated with the differential reference signals
DRS-62 and DRS-64 which are received from storages 74 and
76, respectively, and which were previously generated by
cameras 62 and 64 from an identical fixed camera
position.

As illustrated in Fig. 5A, the differential key
generators 70 and 72 generate differential signals VS-1
and VS-2 which have corresponding images 112 and 114.
Notice that these images 112 and 114 comprise an image of
the participants which are situated at the first station
12 with the background area having been removed or tagged
as a "zero" area. As described herein, this "zero" area
becomes "filled-in" with the desired or predetermined
composite image which may include, for example, an image
of at least a portion of the predetermined setting or
background of the second station 14. It has been found
that removing a portion of the image, such as the
background, by tagging it as zero, in the manner
described herein, facilitates compressing the signals VS-1
and VS-2 and reducing the amount of bandwidth needed to
transmit the images over transmission network 84 and
between the first and second stations 12 and 14.
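
One way to see why a "zero"-tagged background compresses well is the
following minimal Python sketch. It uses run-length encoding purely as a
stand-in: the compression algorithms actually used by the CODECS are not
specified in this description, and the function name run_length_encode
and the sample pixel rows are hypothetical.

```python
def run_length_encode(row):
    """Collapse consecutive equal values into (value, count) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


# One row of a camera image: a slightly noisy background around a subject.
raw_row = [101, 99, 100, 180, 175, 102, 98, 100]
# The same row after the background has been tagged as zero.
tagged_row = [0, 0, 0, 180, 175, 0, 0, 0]

raw_runs = run_length_encode(raw_row)
tagged_runs = run_length_encode(tagged_row)

print(len(raw_runs), "runs before tagging")
print(len(tagged_runs), "runs after tagging:", tagged_runs)
```

Because the tagged background is perfectly uniform, long stretches of it
collapse into a single run, whereas the untagged background yields a new
run at nearly every pixel.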

As mentioned above, the video signals VS-1 and
VS-2 are fed into CODECs 78 and 80 which compress the
signals along with audio signals AS-A1 and AS-A2 and
generate signals CDS-1 and CDS-2. The CDS-1 and CDS-2
signals are then transmitted, via transmission network
84, to the second station 14 and received by the CODECs
78-1 and 80-1 associated with the second station 14. As
illustrated in Fig. 5B, the CODECs 78-1 and 80-1
decompress the CDS-1 and CDS-2 signals, respectively,
from the first station 12 and feed them into associated
line doublers 86-1 and 88-1. As mentioned earlier
herein, the line doublers 86-1 and 88-1 facilitate
enhancing the images associated with the video signals to
provide enhanced video signals EVS-1 and EVS-2 (Fig. 5B),
respectively.

As stated earlier, the enhanced video signals 
EVS-1 and EVS-2 are then received by the video 
compositing multiplexer 92-1 associated with the second 
station 14 wherein the signals are combined to provide an 
intermediate composite signal ICS having an associated 
intermediate composite signal image 120 having an aspect 
ratio of about 8:3. 
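
The 8:3 figure is consistent with two images of 4:3 aspect ratio placed
side by side. The following Python sketch works through that arithmetic.
It assumes, for illustration only, that each enhanced image is 4:3 and
that the two are joined horizontally; this description does not state
how the intermediate composite signal ICS is actually assembled, and the
function names are hypothetical.

```python
from fractions import Fraction


def join_side_by_side(left, right):
    """Join two images of equal height, row by row, left to right."""
    if len(left) != len(right):
        raise ValueError("images must have the same number of rows")
    return [left_row + right_row for left_row, right_row in zip(left, right)]


def aspect_ratio(image):
    """Width-to-height ratio of an image stored as a list of rows."""
    return Fraction(len(image[0]), len(image))


# Two tiny stand-in images, each 4 pixels wide and 3 pixels high (4:3).
left_image = [[1] * 4 for _ in range(3)]
right_image = [[2] * 4 for _ in range(3)]

joined = join_side_by_side(left_image, right_image)
print(aspect_ratio(left_image), aspect_ratio(joined))
```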

The video compositing multiplexer 92-1 also
receives the predetermined composite signal B having a
predetermined composite signal B image 122 from storage
94-1. The video compositing multiplexer 92-1 composites
or combines the images 120 and 122 to generate the
composite signal having an associated or corresponding
composite image 124 as shown in Fig. 5B. As stated
earlier, the predetermined composite signal B image 122
generally corresponds to at least a portion of the
predetermined setting or background of the second station
14 and has an aspect ratio of 16:9.

Notice that when the predetermined composite
signal B image 122 is combined with the intermediate
composite signal image 120, the video compositing
multiplexer 92-1 causes the "zero" area of the
intermediate composite signal image 120 to be "filled in"
with the predetermined composite signal B image.
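
The "fill in" step can be sketched as follows in Python. This is a
minimal illustration that assumes both images are grayscale lists of
rows with identical dimensions and that a pixel value of 0 marks the
tagged area; the function name fill_zero_area and the sample values are
hypothetical, and the processing inside the video compositing
multiplexer 92-1 is not described at this level of detail.

```python
def fill_zero_area(foreground, background):
    """Replace every zero-tagged foreground pixel with the background pixel."""
    return [
        [back if front == 0 else front
         for front, back in zip(front_row, back_row)]
        for front_row, back_row in zip(foreground, background)
    ]


# Differentiated image of the remote participants (0 marks removed background).
foreground = [[0, 0, 200, 0],
              [0, 210, 205, 0]]
# Stored image of the local setting, used to fill the tagged area.
background = [[1, 2, 3, 4],
              [5, 6, 7, 8]]

composite = fill_zero_area(foreground, background)
print(composite)
```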

The composite image 124 may then be scaled to a
predetermined size or scale, such as full scale, using
scaler 95-1, so that the composite image 124 is
displayed as a substantially full scale or real-life size
image as desired. The composite image signal
corresponding to the composite image 124 is transmitted
to the projector 96-1 and then displayed on the rear
projection screen 46-1 at the second station 14. As
illustrated in Figs. 1B and 5B, the composite image 124
may be appropriately framed or masked (such as with an
archway 125 in Figs. 1B and 5B) when it is projected at
the second station 14 to enhance the face-to-face, real
time environment.

The audio and video signals transmitted between
the first and second stations 12 and 14 may be, in this
illustration, transmitted over separate T-1 lines (not
shown) in the transmission network 84 in order to effect
a substantially simultaneous and/or "real time" video
conference. Thus, in the illustration shown in Figs. 1A
and 1B, the participants may be geographically remotely
located, yet the participants situated at the first
station 12 will feel as if the second station 14
participants are located face-to-face or present with
them at the first station 12, while the participants
situated at the second station 14 will feel as if the
first station participants are face-to-face or present
with them at the second station.

It should be appreciated that when the
predetermined composite signal B and associated
predetermined composite signal image 122 is composited
with the intermediate composite signal and associated
intermediate composite signal image 120, it overlays that
signal to provide a seamless composite image 124, which
facilitates reducing or eliminating the need to match up
the borders or seams of the camera images with any high
degree of accuracy. In this regard, it is preferable
that cameras 62 and 64 and 62-1 and 64-1 be
situated such that they capture an entire participant
rather than, for example, half of a participant. Thus,
it may be desired to position the participants in a
location such that any particular participant will not
be in the field of view of more than one camera.

Advantageously, the invention provides an
apparatus and method for providing a video mirror at each
station 12 and 14 which facilitates creating a face-to-face
and non-interrupted image of any participants in the
video conference. Because the image of the participants
is differentiated, less transmission bandwidth, computer
memory and the like is required. Also, the
differentiators and compositors of the present invention
enable a user to create a composite image 124 (Fig. 5B)
having at least a portion thereof imaged at a greater
resolution than the portion which was transmitted over
transmission network 84. This facilitates reducing the
effect of limitations or transmission restrictions of the
transmission network 84 which, in turn, facilitates
increasing the quality of images displayed at a station.

In addition, notice that the composite image
124 (Fig. 5B) may have an aspect ratio which is different
from the aspect ratio of the cameras 62 and 64. This
enables the system and method of the present invention to
utilize cameras which generate images having smaller or
even larger aspect ratios. This also enables the system
and method to use cameras having standard or common
aspect ratios, such as 4:3.

Figs. 3A and 3B, when taken together,
illustrate another embodiment of the invention. The
operation and components of the embodiment shown in Figs.
3A and 3B are substantially the same as the operation of
components of the embodiment described above relative to
Figs. 1A and 1B with the same reference numerals being
used for the same components with the addition of single
prime (') designator. Consequently this embodiment is
similar to the embodiment shown in Figs. 1A and 1B,
except that the second predetermined setting 14a' in Fig.
3B and its associated theme, aura or motif is
substantially different from the second predetermined
setting 14a shown in Fig. 1B. In Fig. 3B, the second
predetermined sensory setting 14a' comprises a plurality
of decorations 120 defining the Chinese theme, motif or 
aura. Also, the predetermined composite signal A stored 
in storage 94-1' and the differential reference signals
stored in storages 74-1' and 76-1' would generally
correspond to an image of at least a portion of that setting
14a'.

As with the illustration described above
relative to Figs. 1A and 1B, the video and audio signals
would be processed in substantially the same manner. In
general, an image of the participants situated at the
first station 12' is composited by compositor 92-1' with
a predetermined composite image of at least a portion of
the second predetermined sensory setting 14a' of the
second station 14' and projected onto the rear projection
screen 46-1' at the second station 14'. The first
station 12' participants appear to be face-to-face with
the second station 14' participants because they have a
relatively high resolution video image behind them which
complements or becomes integrated with the second
predetermined sensory setting 14a'. Thus, as shown in
Fig. 3B, the image 91-1' of the ladies at the
first station 12' includes a Chinese background which
blends or complements the actual predetermined sensory
setting 14a'.

Likewise, when the image of the participants
situated at the second station 14' is projected on the
rear projection screen 46' at the first station 12', they
appear to be in the same room as the participants
situated at the first station 12' because the
Roman/Italian video background which is seen behind the
second station 14' participants generally complements and
becomes visually integrated with the actual Roman/Italian
theme, motif or aura defined by the first predetermined
sensory setting 12a' of the first station 12'.

Figs. 4A and 4B, when taken together,
illustrate another embodiment of the invention. The
components of the embodiment shown in Figs. 4A and 4B
which are substantially identical to the components in
the embodiment shown in Figs. 1A and 1B have the
same reference numerals with the addition of a double
prime ("''") designator. As illustrated in Figs. 4A and
4B, two remote modular stations such as stations 12'' and
14'' may be provided and designed to have first and
second predetermined sensory settings 12a'' and 14a''
which are substantially identical. Thus, as shown in
Figs. 4A and 4B, images may be captured in the manner
described above at station 12'', received by CODECS 78''
and 80'' and then transmitted, via transmission 84'', to
associated CODECS 78-1'' and 80-1'', respectively. The
CODECS 78-1'' and 80-1'' then generate a decompressed
signal which may be enhanced by line doublers 86-1'' and
88-1'', respectively; scaled to an appropriate scale by
scaler 95-1''; and then projected by projector 96-1''
onto rear projection screen 46-1''.
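
The enhancing and scaling stages of the signal path just described may be illustrated with the following minimal sketch. It assumes the simplest possible behavior for each stage (repeating each scan line, and nearest-neighbour resampling); actual line doublers and scalers may interpolate or use other methods. The function names and frame format are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale rows (illustrative format)


def line_double(frame: Frame) -> Frame:
    """Double the number of scan lines by emitting each line twice."""
    doubled: Frame = []
    for row in frame:
        doubled.append(list(row))
        doubled.append(list(row))
    return doubled


def scale_nearest(frame: Frame, out_h: int, out_w: int) -> Frame:
    """Resize to out_h x out_w using nearest-neighbour sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


decoded = [[1, 2], [3, 4]]             # stands in for a decompressed CODEC frame
doubled = line_double(decoded)         # 4 lines of 2 pixels
scaled = scale_nearest(doubled, 4, 4)  # 4 lines of 4 pixels, ready for display
print(scaled)
```

The ordering in the sketch follows the text: the decompressed signal is enhanced first and scaled second, so the scaler works from the higher line count.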

Notice that the image comprising the second
station 14'' participants and second predetermined
sensory setting 14a'' is displayed on screen 46'' at the
first station 12''. Thus, this embodiment does not
utilize the differentiating and compositing features of
the previous embodiment, but may still achieve a
face-to-face conference environment because the second
predetermined sensory setting 14a'' is configured to be
identical to or complementary with the first
predetermined sensory setting 12a''. In this embodiment,
entire images or sub-images of the stations 12'' and 14''
(including images of both participants and background)
are displayed at remote station(s). Because the stations
12'' and 14'' are assembled, decorated and designed to be
complementary or identical, they appear visually
integrated to participants situated in the stations 12''
and 14''. Accordingly, the first and second predetermined
sensory settings 12a'' and 14a'', including the background,
are designed and arranged in a geometric fashion such
that as cameras 62'' and 64'' capture images of the
participants, they also capture images of the first and
second predetermined sensory settings 12a'' and 14a'',
respectively, at the most advantageous perspective for
display at the remote station(s). As with prior
embodiments, this causes the first station 12''
participants to perceive that the second station 14''
participants are situated or present with the first
station 12'' participants at the first station 12''.
Likewise, the first station 12'' participants appear to
be face-to-face with the second station 14'' participants
at the second station 14'' when the images associated
with the first station 12'' are displayed on screen
46-1''. Consequently, by providing complementary or
identical first and second predetermined sensory settings
12a'' and 14a'', a face-to-face conference may be
created. As with previous embodiments, it may also be
desired to differentiate, enhance, composite or scale the
images as described with previous embodiments, but this
is not required with the embodiment being described.

Thus, it should be apparent that stations can
be provided with predetermined settings which are
completely different, yet, by utilizing the apparatus and
method of the present invention, the images of the
participants in these stations may be projected at remote
stations so that they appear to be virtually face-to-face
with the remote station participants at one or more
remote stations.

Various changes or modifications in the
invention described may occur to those skilled in the art
without departing from the spirit or scope of the
invention. For example, although the screen 46 for station
12 has been shown as being integral with a portion of a
wall 32h (Figs. 1A and 2A), it could comprise a larger or
smaller portion of that wall 32h, or it could be provided
as part of one or more other walls, or even as part of the
ceiling 34.
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It should also be appreciated that while the 
embodiments have been shown and described comprising two 
stations, images from more than two remote stations may 
be displayed at a station, thereby permitting a 
teleconference convention among more than two stations.
Although not shown, one or more of the
compositors, such as compositors 92 or 92-1 (Fig. 1A), may
comprise a stationary or moving image database (not
shown) for providing a plurality of predetermined
composite signals which define a particular or desired
video background. For example, participants may elect to
use the arched background of their proximity, choose an
event-related scene, or decide to meet in a setting
completely unrelated to their site or station. For
example, a station having a Manhattan eatery motif may be
provided with a screen configured as a window (not
shown). Certain moving video backgrounds of a busy New
York avenue may be deposited and displayed on the screen
to give the illusion that the participants situated at
the station are dining in a popular Manhattan eatery.
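
One way such a stationary or moving image database could be organized is sketched below. This is an illustrative assumption only: the class name `BackgroundLibrary`, its methods, and the background names are hypothetical and are not taken from the described apparatus. A stationary background is modeled as a clip of one frame and a moving background as a clip of several frames that repeats.

```python
from typing import Dict, List

Frame = List[List[int]]  # grayscale rows (illustrative format)


class BackgroundLibrary:
    """Hypothetical store of predetermined backgrounds, keyed by name."""

    def __init__(self) -> None:
        self._clips: Dict[str, List[Frame]] = {}

    def add(self, name: str, frames: List[Frame]) -> None:
        """Register a background; one frame is stationary, several is moving."""
        if not frames:
            raise ValueError("a background needs at least one frame")
        self._clips[name] = frames

    def frame_at(self, name: str, tick: int) -> Frame:
        """Return the frame to composite at this tick; moving clips loop."""
        clip = self._clips[name]
        return clip[tick % len(clip)]


library = BackgroundLibrary()
library.add("arched", [[[5, 5]]])                     # stationary background
library.add("new_york_avenue", [[[1, 1]], [[2, 2]]])  # moving background
print(library.frame_at("new_york_avenue", 3))
```

The frame returned for the selected background would then serve as the predetermined composite image supplied to the compositor.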

It should also be appreciated that while the
embodiments being shown and described herein refer to
teleconferencing environments that have predetermined
settings and motifs or auras relating to dining, the
predetermined settings could define any type of aura,
theme or motif which is suitable for video conferencing
and in which it is desired to provide a "real-life" or
face-to-face presence illusion. For example, the
apparatus and method of this invention could be used in a
business setting, education setting, seminar setting,
home environment, religious setting, celebration setting
(such as a birthday, retirement party, holiday or
anniversary), or any other suitable setting as desired.
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The above description of the invention is 
intended to be illustrative and not limiting, and it is not
intended that the invention be restricted thereto but
that it be limited only by the spirit and scope of the
appended claims.

What is claimed is: 



1. An image generator for use in a
teleconferencing system comprising:

a differentiator for comparing a differential
reference image to an input video image from a station
and for generating a differential image in response
thereto; and

a compositor associated with a remote station
for receiving said differential image and for combining
that differential image with a predetermined composite
image to provide a composite image which may be displayed
at the remote station.

2. The image generator as recited in claim 1
wherein said differentiator comprises a differential key
generator.

3. The image generator as recited in claim 1
wherein said differential image generally corresponds to
an image of subjects situated at the station.

4. The image generator as recited in claim 1
wherein said differential reference image generally
corresponds to at least a portion of the station.

5. The image generator as recited in claim 1
wherein said predetermined composite image generally
corresponds to at least a portion of said remote station
in the teleconferencing system.

6. The image generator as recited in claim 1
wherein the image generator further comprises:

at least one CODEC coupled to said
differentiator and said compositor for facilitating
exchanging signals therebetween.

7. The image generator as recited in claim 1
wherein said image generator further comprises an image
enhancer coupled to said compositor for enhancing the
resolution of said composite image by a predetermined
amount.

8. The image generator as recited in claim 7
wherein said image enhancer is a line doubler.

9. The image generator as recited in claim 7
wherein said predetermined amount is on the order of
about 50-150%.

10. The image generator as recited in claim 1
wherein said compositor comprises a scaler for scaling
the composite image.

11. The image generator as recited in claim 3
wherein said subjects comprise at least one participant
and at least one predetermined subject.

12. The image generator as recited in claim 5
wherein said portion comprises a background image.
13. A conferencing system comprising:

a first station comprising a first sensory area
defining a first aura;

a second station comprising a second sensory
area defining a second aura; and

an image system for generating a first station
image of at least a portion of said first sensory area
and also for displaying said first station image at said
second station such that said first and second auras
become visually combined to provide an integrated
face-to-face presence environment at said second station.

14. The conferencing system as recited in claim 13 
wherein said first station image comprises at least one 
sub-image of predetermined subjects situated in said 
first sensory area. 

15. The conferencing system as recited in claim 13 
wherein said image system comprises: 

a compositor for compositing said first station
image with a predetermined composite image to generate
the composite image.

16. The conferencing system as recited in claim 15
wherein said compositor comprises a scaler for scaling 
the first station image. 

17. The conferencing system as recited in claim 15 
wherein said composite reference image comprises an image 
of at least a portion of said second sensory area of said 
second station. 
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18. The conferencing system as recited in claim 13 
wherein said imaging system further comprises a
differentiator for comparing said first station image
with a differentiator reference image and generating said
first station image in response thereto.

19. The conferencing system as recited in claim 18 
wherein said differentiator comprises a differential key 
generator. 



20. The conferencing system as recited in claim 13 

wherein said conferencing system further comprises: 

an audio transceiver device associated with 
said first and second stations for exchanging and 
broadcasting audio signals between said first and second 
sensory areas. 

21. The conferencing system as recited in claim 13 
wherein said first and second sensory areas are 
complementary.

22. The conferencing system as recited in claim 14 
wherein said predetermined subjects are at least one 
participant and a plurality of predetermined decorations. 
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23. The conferencing system as recited in claim 13 
wherein said image system further comprises: 

a differentiator for generating a differential
signal by comparing a first station image signal
generally corresponding to said image to a differential
reference image signal;

compositing means for combining said
differential signal with at least one other signal to
provide a display image for displaying at said second
station.

24. The teleconferencing system as recited in claim 
13 wherein said first and second auras are substantially 
identical.

25. The teleconferencing system as recited in claim
23 wherein said compositing means comprises:

a compositor associated with said
differentiator for compositing said image signal with a
predetermined reference image to provide a composite
image for displaying at either said first or second
station.
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26. An image system for use in a conference 

environment comprising a station having a first 
conference area and a remote station having a remote 
video area, said image system comprising: 
a compositor for compositing a first signal
which generally corresponds to a video image of a portion
of said first conference area with a composite reference
signal to provide a composite image signal; and

a display for displaying said composited image
signal at said remote video area such that said first and
second stations appear complementarily integrated.

27. The image system as recited in claim 26 wherein 
said image system comprises: 

a differentiator for generating the first 
signal in response to a comparison of a differential 
reference signal to an input signal corresponding to an
image of said first conference area. 

28. The image system as recited in claim 27 wherein 
said differential reference signal generally corresponds 
to an image of any desired subjects situated at said 
first conference area. 

29. The image system as recited in claim 27 wherein 
said differentiator is a differential key generator. 

30. The image system as recited in claim 26 wherein 
said image system further comprises: 

an audio transceiver device for exchanging and 
broadcasting audio signals between said station and said 
remote station.
31. The image system as recited in claim 26 wherein
said compositor comprises a scaler for scaling the
composite image signal.

32. The image system as recited in claim 26 wherein 
said composite image signal corresponds to a composite 
image comprising a first image having a first resolution 
and a second image having a second resolution wherein
said first and second resolutions are different.

33. The image system as recited in claim 32 wherein 
said first image corresponds to a background and said 
first resolution is higher than the second resolution. 

34. The image system as recited in claim 26 wherein 
at least one of said station or said remote station is a 
modular construction. 



WO 96/09722 




PCT/US95/11802 




35. A method for providing a virtual presence 
conference in a teleconferencing system having a first 
station and a second station comprising the step of: 

displaying an image formed from at least one 
sub-image from the first station at a predetermined
location in the second station such that said image 
becomes visually integrated with said second station to 
define a single predetermined aura at said second 
station. 

36. The method as recited in claim 35 wherein said 
displaying step further comprises the steps of: 

differentiating between an actual image of said 
first station and a reference image to provide said at 
least one sub-image.

37. The method as recited in claim 36 wherein said 
differentiating step further comprises the step of: 

storing an image of at least a portion of said 
first station as said reference image. 

38. The method as recited in claim 36 wherein said 
method further comprises the step of: 

using a differential key generator. 

39. The method as recited in claim 35 wherein said 
image is a composited image, said at least one of said at 
least one sub-image comprises a predetermined image, said 
displaying step further comprising the step of: 

compositing said predetermined image with said
at least one sub-image to provide said image.
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40. The method as recited in claim 39 wherein said 
compositing step further comprises the step of: 

scaling the image.

41. The method as recited in claim 35 wherein said
method further comprises the step of:

enhancing the resolution of said image.

42. The method as recited in claim 39 wherein said 
enhancing step comprises the step of: 

using a line doubler to enhance the resolution 
of said image. 

43. The method as recited in claim 35 wherein said 
method further comprises the step of: 

displaying said image on a rear projection 
screen integrally associated with said second station. 

44. The method as recited in claim 35 wherein said 
method further comprises the step of: 

using a CODEC to facilitate exchanging images 
between said first and second stations. 
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45. The method as recited in claim 35 wherein said 

method further comprises the steps of: 

generating an audio signal associated with the
image;

broadcasting said audio signal at said second
station in general synchronization with said image.

46. The method as recited in claim 35 wherein said
method further comprises the steps of:

generating an image having a first image having
at least a first resolution and a second image having at
least a second resolution, said first and second
resolutions being different.

47. The method as recited in claim 46 wherein said
first resolution is higher than said second resolution,
said first resolution corresponding to a background of
said second station.

48. A method for teleconferencing comprising the
steps of:

teleconnecting a first station having a first
setting to a second station having a second setting; and

displaying a composite image, including an
image of at least a portion of said first station, at
said second station such that when said composite image
is displayed at said second station it cooperates with
said second setting to facilitate providing a
face-to-face presence environment at said second station.
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49. The method as recited in claim 48 wherein said 
method further comprises the step of: 

comparing an actual image of said first station 
to a differential reference image and generating said 
image in response thereto.

50. The method as recited in claim 49 wherein said
method comprises the step of:

using a differential key generator to generate
said image.

51. The method as recited in claim 50 wherein said
differential reference image comprises at least a portion
of said first setting.

52. The method as recited in claim 48 wherein said
method further comprises the step of:

compositing said image with a predetermined
composite image to provide a seamless composite image.

53. The method as recited in claim 52 wherein said
composite reference image comprises composite sub-images
comprising a plurality of predetermined subjects.

54. The method as recited in claim 52 wherein said
compositing step further comprises the step of:

scaling said composite image to a predetermined
scale.

55. The method as recited in claim 49 wherein said 
method further comprises the step of: 

compositing said image with a second reference 
image to provide a seamless composite image. 
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56. The method as recited in claim 53 wherein said 
plurality of predetermined subjects comprise the 
background of said second setting. 

57. The method as recited in claim 48 wherein said 
method comprises the step of: 

providing a first setting which generally 
complements said second setting such that when said
composited image is displayed at said second station,
said first and second settings become visually integrated
to a participant situated at said second station.

58. The method as recited in claim 48 wherein said 
method further comprises the step of: 

using a CODEC to facilitate transmitting
images between said first and second stations. 

59. The method as recited in claim 48 wherein said 
displaying step further comprises the step of: 

compositing said composite image to have an 
aspect ratio of at least 4:3; 

projecting said composite image in 
substantially full scale on a rear projection screen at 
said second station. 

60. The method as recited in claim 48 wherein said
displaying step further comprises the step of:

enhancing the resolution of said composite
image.

61. The method as recited in claim 60 wherein said
enhancing step further comprises the step of using a line
doubler.
62. A method for generating a seamless image at a
station from a plurality of sub-images at least one of
which is received from a remote station comprising the
steps of:

generating said plurality of sub-images; and

combining said plurality of sub-images with a
predetermined composite image to provide said seamless
image.

63. The method as recited in claim 62 wherein said
generating step comprises the step of:

generating said plurality of sub-images using a
plurality of image sensors.

64. The method as recited in claim 62 wherein said
combining step comprises the step of:

providing a predetermined composite image which
includes at least a portion of the background of said
station.

65. The method as recited in claim 62 wherein said
generating step further comprises the step of:

differentiating between an actual image and a
differential reference image in order to generate said
plurality of sub-images.

66. The method as recited in claim 65 wherein said
method further comprises the step of:

using a differential key generator to generate
said plurality of sub-images.
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67. A method of generating an image for a video 

conference comprising the steps of: 

capturing an image at one station;

filtering said captured image to provide a
filtered image;

compositing the filtered image with a
predetermined composite image to provide a composite
image;

displaying the composite image at a remote
station.

68. The method of claim 67 wherein said captured
image comprises a participant image of at least one
participant and a background image, said filtering step
comprising the step of:

differentiating said captured image to separate
the background image from the participant image to
provide said filtered image.

69. A video mirror system for use in a video
conference, comprising a plurality of stations
comprising:

a display; and

an imager coupled to said display for
generating a superimposed image comprising at least a
portion of one of said plurality of stations with a
remote image of at least one participant from a remote
station and also for causing said display to display said
superimposed images.

70. The video mirror system as recited in claim 69
wherein said imager comprises a differentiator.
71. The video mirror system as recited in claim 69
wherein said imager comprises a compositor coupled to 
said differentiator. 

72. A method for generating a video scene at a 
station comprising the steps of: 

providing a first image; 

combining at least one remote image with the
first image to provide a combined image; and

displaying the combined image at the station to
facilitate providing a predetermined aura at the station.

73. The method as recited in claim 72 wherein said
first image comprises a portion of the background of the
station; said combining step comprising the step of:

differentiating an image captured at a remote
station to provide said remote image.

74. The method as recited in claim 73 wherein said
combining step comprises the step of using a compositor
to combine said at least one remote image with the first
image.

75. The conferencing system as recited in claim 13 
wherein at least one of said first or second stations 
comprise a modular assembly. 
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